Statistical Identification of Collocations in Large Corpora for Information Retrieval
Abstract
The linguistic phenomenon of collocation, the habitual juxtaposition of certain words in natural language, has been shown to benefit natural language processing tasks such as information retrieval. This paper examines the utility of several collocation extraction methods for document retrieval, specifically for queries in question form.
Similar resources
Identification of Noun-Noun (N-N) Collocations as Multi-Word Expressions in Bengali Corpus
Noun-Noun compounds, as a subset of Compound Nouns as well as Nominal Compounds, play an important role in NLP applications like Machine Translation and Information Retrieval because of their token frequency, type frequency, and occurrence in the world’s languages. Recognition of MWEs requires deep or shallow syntactic preprocessing tools and large corpora. The problem is quite difficult in Beng...
Retrieving Collocations from Text: Xtract
Natural languages are full of collocations, recurrent combinations of words that co-occur more often than expected by chance and that correspond to arbitrary word usages. Recent work in lexicography indicates that collocations are pervasive in English; apparently, they are common in all types of writing, including both technical and nontechnical genres. Several approaches have been proposed to ...
INFO256 Project Report Implementation and Evaluation of Xtract in WordSeer
Natural languages are full of word collocations that frequently co-occur and correspond to arbitrary word usages. They appear in both technical and non-technical textual corpora and often have specific significance in individual contexts. Accurately retrieving and identifying collocations from a given corpus in an unsupervised manner is imperative to understanding and automatically generating t...
Domain Collocation Identification
In this paper we present a new method of automatic collocation identification. Collocation is an important relation between words, which is widely used, among others, in information retrieval tasks. Over the last years, many methods of automatic collocation acquisition from text corpora have been proposed. The approach described in this paper differs from the others by focusing on domain colloc...
Collocation Mining: Exploiting Corpora for Collocation, Identification and Representation
The work presented provides computational linguistics methods and tools for collocation identification from arbitrary text, and methods and tools for representing collocations in a relational database integrating competence (collocation-type-specific linguistic analysis) and performance information (corpus sentences). The work differs from existing approaches to collocation identification in syste...